Principled Query Processing
Abstract
This year, the SICS team decided to concentrate on query processing and on the internal topical structure of the query, which we have identified as one of the major bottlenecks for cross-lingual access systems. In previous years, the SICS team has investigated, among other issues, how to translate compounds. Compound translation is non-trivial because of dependencies between compound elements, and it has been handled in various ways for compounding languages such as Swedish. This year we decided to investigate the topical dependencies between query terms, under the hypothesis that the complexity of translating compounds is a special case of the more general problem of understanding the respective topicality of query terms. The question under investigation is how much each query term contributes in terms of topicality in the documents of the collection under consideration. If a query term is non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. Our base system is used with two different enhancements to test the hypothesis that boosting topically active terms is beneficial for retrieval results. Both schemes are based on analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms; the other uses the likelihood of individual terms to appear topically in text. These are two different avenues of analysis and will most likely yield different results if pursued further than these initial experiments. Both boosting schemes delivered uncontroversially improved results. These results will provide impetus for the further study of the translation of complex terms, the question that prompted this set of experiments in the first place.
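The abstract does not say how topicality is measured, so the following is only a minimal sketch of the general idea of boosting topically active query terms. It assumes a burstiness measure (mean occurrences per document among the documents that contain the term) as a stand-in for "likelihood of appearing topically"; the function names `burstiness`, `term_weights`, and `rank`, and the toy collection, are hypothetical and are not taken from the SICS system.

```python
from collections import Counter


def burstiness(term, docs):
    """Mean occurrences per document, over documents containing the term.

    Assumption: terms that repeat within the documents they occur in are
    treated as more topical than terms that occur once in passing.
    """
    counts = [doc.count(term) for doc in docs if term in doc]
    return sum(counts) / len(counts) if counts else 0.0


def term_weights(query, docs):
    """Assign each query term a weight equal to its burstiness."""
    return {term: burstiness(term, docs) for term in query}


def rank(query, docs, weights):
    """Rank document indices by the weighted sum of query term frequencies."""
    scores = []
    for index, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(weights[term] * tf[term] for term in query)
        scores.append((score, index))
    # Highest score first; ties broken by document index.
    return [index for score, index in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy collection of tokenized documents (illustrative data only).
docs = [
    ["nuclear", "waste", "nuclear", "storage", "report"],
    ["annual", "report", "on", "trade"],
    ["nuclear", "plant", "nuclear", "nuclear", "safety", "report"],
]
query = ["nuclear", "report"]
weights = term_weights(query, docs)
ranking = rank(query, docs, weights)
```

In this toy collection "nuclear" repeats wherever it occurs and so receives a higher weight than "report", which appears once in every document; the ranking therefore favors the documents that are about the topical term. A real system would combine such a weight with its existing ranking function and would need a smoothed estimate over a large collection.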
Similar resources
Selecting the Most Suitable Query Language for Using Outer Joins to Extract Data in Datalog Mode in the Deductive Database System DES
Deductive database systems are designed around a logical data model. In a deductive database system, data are saved as facts, as opposed to a Relational Database Management System (RDBMS), in which data are stored in tables. Datalog Educational System (DES) is a deductive database system whose default mode is Datalog mode. It can extract data using outer joins with three query la...
Poor Usability in Data Processing
The databases community works hard on the scale, performance, and correctness of the storage and query processing systems that our users depend on. Researchers are therefore frustrated to see less principled, and often incorrect 2010s implementations of concepts that were introduced in the 1970s. The lens of usability can help us understand how certain systems see adoption, regardless of the so...
Efficient Processing of Ad-Hoc Top-k Aggregate Queries in OLAP
In this paper, we develop a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries in OLAP. Such queries provide the k groups with the highest aggregates to decision makers. Essential support of top-k aggregate queries is lacking in current RDBMSs, which process such queries in a naïve and overkill materialize-group-sort scheme, therefore can be prohibitively ...
Hierarchical Dirichlet Trees for Information Retrieval
We propose a principled probabilistic framework which uses trees over the vocabulary to capture similarities among terms in an information retrieval setting. This allows the retrieval of documents based not just on occurrences of specific query terms, but also on similarities between terms (an effect similar to query expansion). Additionally our principled generative model exhibits an effect si...
FluXQuery: An Optimizing XQuery Processor for Streaming XML Data
XML has established itself as the ubiquitous format for data exchange on the Internet. An imminent development is that of streams of XML data being exchanged and queried. Data management scenarios where XQuery [11] is evaluated on XML streams are becoming increasingly important and realistic, e.g. in e-commerce settings. Naturally, query engines employed for stream processing are main-memory-ba...
Distributed Query Monitoring through Convex Analysis: Towards Composable Safe Zones
Continuous tracking of complex data analytics queries over high-speed distributed streams is becoming increasingly important. Query tracking can be reduced to continuous monitoring of a condition over the global stream. Communication-efficient monitoring relies on locally processing stream data at the sites where it is generated, by deriving site-local conditions which collectively guarantee th...
Publication date: 2005